Semantic Unit Mining and Embedding
Authors
Abstract
Semantic units, meaning words or phrases together with their topics and named entities together with their specific types, are the basis of the human cognitive system. Arguably, a large share of semantic units are not single words. However, most modern NLP applications use single words as the basic unit for processing language, which can discard valuable semantic information. Finding and generating semantic units efficiently in a large corpus is therefore a significant challenge. In this paper, we propose a method for learning embedding representations of semantic units. We first use phrase mining, named entity recognition, and topic modeling to find semantic units. We then chunk the training corpus into semantic tokens and use the Skip-gram model to train the embeddings. Our experiments show that semantic-unit-based embeddings outperform word-level embeddings on multiple tasks.
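The chunking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-written `lexicon` stands in for the output of phrase mining and named entity recognition, the greedy longest-match strategy and the underscore joining are assumptions, and the helper names `chunk_into_units` and `skipgram_pairs` are hypothetical. The second function only enumerates the (center, context) pairs that a Skip-gram model would be trained on; it does not train embeddings.

```python
def chunk_into_units(tokens, lexicon, max_len=4):
    """Merge word tokens into semantic units by greedy longest match.

    `lexicon` is a set of word tuples (an assumed stand-in for mined
    phrases and recognized entities). Unmatched words stay single units.
    """
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first; a single word always matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in lexicon:
                units.append("_".join(candidate))
                i += n
                break
    return units


def skipgram_pairs(units, window=2):
    """List the (center, context) pairs a Skip-gram model would see."""
    pairs = []
    for i, center in enumerate(units):
        lo = max(0, i - window)
        hi = min(len(units), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, units[j]))
    return pairs


# Hypothetical lexicon and sentence, for illustration only.
lexicon = {("new", "york"), ("machine", "learning")}
tokens = "she studies machine learning in new york".split()

units = chunk_into_units(tokens, lexicon)
pairs = skipgram_pairs(units, window=1)

print(units)
print(pairs[:4])
```

Because multiword units become single tokens before pair generation, a phrase such as `new_york` receives its own vector instead of being represented only through the separate vectors for its words.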
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Common Space Embedding of Primal-Dual Relation Semantic Spaces
Explicit continuous vector representation such as vector representation of words, phrases, etc. has been proven effective for various NLP tasks. This paper proposes a novel method of constructing such vector representation for both entity-pairs and relation expressions which link them in text. Based on the insight of the duality of relations, the representation is constructed by embedding of tw...
Perform Three Data Mining Tasks with Crowdsourcing Process
In data mining studies, because performing feature selection by hand is complex, some of the labeling needs to be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...
Development of a Combined System Based on Data Mining and Semantic Web for the Diagnosis of Autism
Introduction: Autism is a nervous system disorder, and since there is no direct diagnosis for it, data mining can help diagnose the disease. Ontology as a backbone of the semantic web, a knowledge database with shareability and reusability, can be a confirmation of the correctness of disease diagnosis systems. This study aimed to provide a system for diagnosing autistic children with a combinat...